Topic Modeling on Health Journals with Regularized Variational Inference

نویسندگان

  • Robert Giaquinto
  • Arindam Banerjee
چکیده

Topic modeling enables exploration and compact representation of a corpus. The CaringBridge (CB) dataset is a massive collection of journals written by patients and caregivers during a health crisis. Topic modeling on the CB dataset, however, is challenging due to the asynchronous nature of multiple authors writing about their health journeys. To overcome this challenge we introduce the Dynamic Author-Persona topic model (DAP), a probabilistic graphical model designed for temporal corpora with multiple authors. The novelty of the DAP model lies in its representation of authors by a persona — where personas capture the propensity to write about certain topics over time. Further, we present a regularized variational inference algorithm, which we use to encourage the DAP model’s personas to be distinct. Our results show significant improvements over competing topic models — particularly after regularization, and highlight the DAP model’s unique ability to capture common journeys shared by different authors.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Fast and Effective Framework for Lifelong Topic Model with Self-learning Knowledge

To discover semantically coherent topics from topic models, knowledge-based topic models have been proposed to incorporate prior knowledge into topic models. Moreover, some researchers propose lifelong topic models (LTM) to mine prior knowledge from topics generated from multi-domain corpus without human intervene. LTM incorporates the learned knowledge from multi-domain corpus into topic model...

متن کامل

Incorporating Word Correlation Knowledge into Topic Modeling

This paper studies how to incorporate the external word correlation knowledge to improve the coherence of topic modeling. Existing topic models assume words are generated independently and lack the mechanism to utilize the rich similarity relationships among words to learn coherent topics. To solve this problem, we build a Markov Random Field (MRF) regularized Latent Dirichlet Allocation (LDA) ...

متن کامل

Using Variational Inference and MapReduce to Scale Topic Modeling

Latent Dirichlet Allocation (LDA) is a popular topic modeling technique for exploring document collections. Because of the increasing prevalence of large datasets, there is a need to improve the scalability of inference of LDA. In this paper, we propose a technique called MapReduce LDA (Mr. LDA) to accommodate very large corpus collections in the MapReduce framework. In contrast to other techni...

متن کامل

Optimization of Solution Regularized Long-wave Equation by Using Modified Variational Iteration Method

In this paper, a regularized long-wave equation (RLWE) is solved by using the Adomian's decomposition method (ADM) , modified Adomian's decomposition method (MADM), variational iteration method (VIM), modified variational iteration method (MVIM) and homotopy analysis method (HAM). The approximate solution of this equation is calculated in the form of series which its components are computed by ...

متن کامل

Variational Pseudolikelihood for Regularized Ising Inference

I propose a variational approach to maximum pseudolikelihood inference of the Ising model. The variational algorithm is more computationally efficient, and does a better job predicting out-ofsample correlations than L2 regularized maximum pseudolikelihood inference as well as mean field and isolated spin pair approximations with pseudocount regularization. The key to the approach is a variation...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1801.04958  شماره 

صفحات  -

تاریخ انتشار 2017